Entity Query Feature Expansion Using Knowledge Base Links
Recent advances in automatic entity linking and knowledge base
construction have resulted in entity annotations for document and
query collections. For example, documents and queries may be annotated with
entities from large general-purpose knowledge bases such as Freebase and the
Google Knowledge Graph. Understanding how to leverage these entity
annotations of text to improve ad hoc document retrieval is an open
research area. Query expansion is a commonly used technique to
improve retrieval effectiveness. Most previous query expansion
approaches focus on text, mainly using unigram concepts. In this
paper, we propose a new technique, called entity query feature
expansion (EQFE), which enriches the query with features from
entities and their links to knowledge bases, including structured
attributes and text. We experiment using both explicit query entity
annotations and latent entities. We evaluate our technique on TREC
text collections automatically annotated with knowledge base entity
links, including the Google Freebase Annotations (FACC1) data.
We find that entity-based feature expansion results in significant
improvements in retrieval effectiveness over state-of-the-art text
expansion approaches.
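The idea of enriching a query with entity-derived features can be sketched as follows. This is a minimal illustration, not the paper's actual EQFE feature set; the toy knowledge base, its entries, and the alias/type attributes are invented for the example.

```python
# Toy entity query expansion: enrich a query's unigram terms with
# names, aliases, and type labels of entities linked to the query.
TOY_KB = {  # hypothetical KB fragment; a real system would use Freebase-style data
    "barack obama": {"aliases": ["obama", "president obama"],
                     "types": ["politician", "author"]},
}

def expand_query(query, linked_entities, kb):
    """Combine original query terms with entity-derived expansion terms."""
    terms = query.lower().split()
    for ent in linked_entities:
        attrs = kb.get(ent, {})
        for alias in attrs.get("aliases", []):
            terms.extend(alias.split())
        terms.extend(attrs.get("types", []))
    # de-duplicate while preserving order
    seen, out = set(), []
    for t in terms:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

expanded = expand_query("barack obama books", ["barack obama"], TOY_KB)
print(expanded)
```

In a full system the expanded terms would be weighted and fed to the retrieval model rather than simply concatenated.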
Showing the scars
This short presentation examines instances of literary hypertexts intentionally stripped of that which makes them interconnected and updatable. To investigate aspects of how and why text creators, users, and intermediaries de-enhance hypertexts for reasons entirely distinct from the much-studied antipathy to hypertextuality found in some 20th century literary cultures, it contrasts one commercial and one non-commercial (indeed, actively anti-commercial) example: the mass phenomenon of Kindle Direct Publishing and the niche practice of fan binding. Fan bindings, where fanfiction and other fan works are printed and bound as material objects, sometimes using Print on Demand (POD) services but more often by hand, circulate in a gift economy with distinctive ethical norms and, as transformative works in their own right, illustrate how meaning is made as well as lost in uncoupling works from their fan community contexts. Juxtaposing these examples problematises conceptions of either commercial self-publishing or non-commercial fan communities as offering uncomplicated refuge for interactive literature, and challenges narratives of literary communities as enduringly hostile to or no longer interested in experimentation with hypertextuality. The presentation addresses the conference topics of authorship and reading practices from a book history perspective, highlighting the wider significance of stances against hypertextuality and implications for hypertext creators and audiences across genres.
Content-Based Weak Supervision for Ad-Hoc Re-Ranking
One challenge with neural ranking is the need for a large amount of
manually-labeled relevance judgments for training. In contrast with prior work,
we examine the use of weak supervision sources for training that yield pseudo
query-document pairs that already exhibit relevance (e.g., newswire
headline-content pairs and encyclopedic heading-paragraph pairs). We also
propose filtering approaches to eliminate training samples that are too far out
of domain, using two techniques: a heuristic-based approach and a novel
supervised filter that re-purposes a neural ranker. Using several leading neural ranking
architectures and multiple weak supervision datasets, we show that these
sources of training pairs are effective on their own (outperforming prior weak
supervision techniques), and that filtering can further improve performance.Comment: SIGIR 2019 (short paper
Knowledge-rich Image Gist Understanding Beyond Literal Meaning
We investigate the problem of understanding the message (gist) conveyed by
images and their captions as found, for instance, on websites or news articles.
To this end, we propose a methodology to capture the meaning of image-caption
pairs on the basis of large amounts of machine-readable knowledge that has
previously been shown to be highly effective for text understanding. Our method
identifies the connotation of objects beyond their denotation: where most
approaches to image understanding focus on the denotation of objects, i.e.,
their literal meaning, our work addresses the identification of connotations,
i.e., iconic meanings of objects, to understand the message of images. We view
image understanding as the task of representing an image-caption pair on the
basis of a wide-coverage vocabulary of concepts such as the one provided by
Wikipedia, and cast gist detection as a concept-ranking problem with
image-caption pairs as queries. To enable a thorough investigation of the
problem of gist understanding, we produce a gold standard of over 300
image-caption pairs and over 8,000 gist annotations covering a wide variety of
topics at different levels of abstraction. We use this dataset to
experimentally benchmark the contribution of signals from heterogeneous
sources, namely image and text. The best result with a Mean Average Precision
(MAP) of 0.69 indicates that by combining both dimensions we are able to better
understand the meaning of our image-caption pairs than when using language or
vision information alone. We test the robustness of our gist detection approach
when receiving automatically generated input, i.e., using automatically
generated image tags or generated captions, and prove the feasibility of an
end-to-end automated process.
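Casting gist detection as concept ranking with image-caption pairs as queries can be illustrated with a toy scorer. This is only a sketch: the concepts, their descriptions, and the overlap-based score are invented, and the paper's actual signals and vocabulary (Wikipedia-scale) are far richer.

```python
def rank_concepts(caption_tokens, image_tags, concept_descriptions):
    """Score each candidate concept by token overlap with the caption and
    image tags; higher-scoring concepts are closer to the image's gist."""
    signal = set(caption_tokens) | set(image_tags)
    scores = {c: len(signal & set(desc.split()))
              for c, desc in concept_descriptions.items()}
    return sorted(scores, key=scores.get, reverse=True)

concepts = {  # hypothetical Wikipedia-style concepts with short descriptions
    "Climate_change": "rising global temperature ice melt emissions",
    "Polar_bear": "bear arctic ice predator",
    "Zoo": "animals enclosure visitors",
}
ranking = rank_concepts(["polar", "bear", "on", "melting", "ice"],
                        ["bear", "ice", "arctic"], concepts)
print(ranking)
```

Note how combining the caption with the image tags ranks the connotatively relevant concept above purely literal matches, mirroring the paper's finding that both modalities help.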
Retrieve-Cluster-Summarize: An Alternative to End-to-End Training for Query-specific Article Generation
Query-specific article generation is the task of, given a search query,
generating a single article that gives an overview of the topic. We envision such
articles as an alternative to presenting a ranking of search results. While
generative Large Language Models (LLMs) like ChatGPT also address this task,
they are known to hallucinate new information, and their models are secret and
hard to analyze and control. Some generative LLMs provide supporting references, yet
these are often unrelated to the generated content. As an alternative, we
propose to study article generation systems that integrate document retrieval,
query-specific clustering, and summarization. By design, such models can
provide actual citations as provenance for their generated text. In particular,
we contribute an evaluation framework that allows us to separately train and
evaluate each of these three components before combining them into one system.
We experimentally demonstrate that a system composed of the best-performing
individual components also obtains the best overall F-1 system quality.
Comment: 5 pages, 1 figure.
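The three-stage pipeline, with citations as provenance, can be sketched end to end. Every component here is a deliberately crude stand-in (term-overlap retrieval, first-token clustering, concatenation-as-summary) with invented toy data; the point is the architecture, in which each stage can be trained and evaluated separately and the final text carries document ids as citations.

```python
from collections import defaultdict

def retrieve(query, corpus, k=4):
    """Stage 1: score documents by term overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    scored = sorted(corpus.items(),
                    key=lambda kv: len(q & set(kv[1].lower().split())),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

def cluster(doc_ids, corpus):
    """Stage 2: group retrieved documents by a crude key (first token),
    a stand-in for query-specific clustering."""
    groups = defaultdict(list)
    for doc_id in doc_ids:
        groups[corpus[doc_id].split()[0].lower()].append(doc_id)
    return list(groups.values())

def summarize(clusters, corpus):
    """Stage 3: emit one 'paragraph' per cluster, citing its source doc ids."""
    article = []
    for members in clusters:
        text = " ".join(corpus[d] for d in members)
        cites = ", ".join(members)
        article.append(f"{text} [{cites}]")
    return "\n".join(article)

corpus = {"d1": "Coffee improves alertness",
          "d2": "Coffee consumption varies by country",
          "d3": "Tea contains caffeine too"}
article = summarize(cluster(retrieve("coffee caffeine", corpus), corpus), corpus)
print(article)
```

Because each generated paragraph lists the documents it was built from, the provenance is real by construction rather than post-hoc, unlike references attached by a generative LLM.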
How Intercultural is an “Intercultural University”? Some lessons from Veracruz, Mexico
Since the beginning of the 21st century, a new institutional figure has been appearing in the arena of Mexican higher education: the so-called "intercultural university". What was first presented and conceived just as another link in the chain of preschool, primary and increasingly also post-primary schools "with an intercultural and bilingual approach", created in and for the indigenous and multilingual regions of Mexico, now begins to take on the characteristics of a new university subsystem intended to provide academic training which is supposed to be culturally relevant to students who are defined as diverse and different in ethnic, linguistic and/or cultural terms. In practice, this new educational offer is focused on students from indigenous regions who have been excluded from formal higher education and have had access only recently to complete basic education and also gradual access to upper secondary education. In this contribution, we briefly sketch the general tendencies that characterize this emerging educational subsystem, before illustrating a case study which stems from a collaborative ethnography that we are conducting with one of the intercultural universities, the Universidad Veracruzana Intercultural (UVI), in order to finally draw some conclusions on the allegedly "intercultural" character of this new educational institution. Key words: intercultural education; intercultural university; collaborative ethnography; Veracruz.
Entity relatedness for retrospective analyses of global events
Tracking global events through time would ease many diachronic analyses which are currently carried out manually by social scientists.
While entity linking algorithms can be adapted to track events that go by a common name, such a name is often not established in early stages leading up to the event. This study evaluates the utility of entity relatedness for the task of identifying related entities and textual resources that describe the involvement of the entity in the event. In a small study we find that simple relatedness methods obtain a MAP score of 0.74, outperforming many advanced baseline systems such as Stics and Wiki2Vec. A small adaptation of this method provides sufficient explanations of entity involvement for 68% of relevant entities.
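A simple entity-relatedness measure of the kind evaluated here can be sketched as link-neighbourhood overlap. This is a generic Jaccard-style measure over a tiny invented link graph, not the exact measures compared in the study.

```python
def relatedness(e1, e2, links):
    """Jaccard overlap of two entities' link neighbourhoods: a simple
    stand-in for knowledge-base relatedness measures."""
    a, b = links.get(e1, set()), links.get(e2, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# hypothetical link graph: entity -> set of linked pages
links = {
    "Brexit": {"UK", "EU", "Referendum", "Trade"},
    "Article_50": {"UK", "EU", "Referendum", "Treaty"},
    "Olympics": {"Sport", "IOC"},
}
print(relatedness("Brexit", "Article_50", links))  # high overlap
print(relatedness("Brexit", "Olympics", links))    # no overlap
```

The shared neighbours ({UK, EU, Referendum} above) double as an explanation of why two entities are related, which is what makes such measures useful for explaining entity involvement in an event.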
Local and global query expansion for hierarchical complex topics
In this work we study local and global methods for query expansion for multifaceted complex topics. We study word-based and entity-based expansion methods and extend these approaches to complex topics using fine-grained expansion on different elements of the hierarchical query structure. For a source of hierarchical complex topics we use the TREC Complex Answer Retrieval (CAR) benchmark data collection. We find that leveraging the hierarchical topic structure is needed for both local and global expansion methods to be effective. Further, the results demonstrate that entity-based expansion methods show significant gains over word-based models alone, with local feedback providing the largest improvement. The results on the CAR paragraph retrieval task demonstrate that expansion models that incorporate both the hierarchical query structure and entity-based expansion result in a greater than 20% improvement over word-based expansion approaches.
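Fine-grained expansion over a hierarchical query can be sketched as expanding each element of the title/heading path separately and merging the results. The weighting scheme, the example CAR-style path, and the expansion terms below are all invented for illustration; they are not the paper's models.

```python
def expand_hierarchical(query_path, expansions):
    """Expand each element of a hierarchical query (title / heading / subheading)
    with its own expansion terms, then merge into one weighted term set."""
    weights = {}
    for depth, element in enumerate(query_path):
        w = depth + 1  # assumption: deeper (more specific) elements weigh more
        for term in [element] + expansions.get(element, []):
            weights[term] = weights.get(term, 0) + w
    return weights

# hypothetical CAR-style query path: "Coffee / Health effects / Caffeine"
expansions = {"Coffee": ["espresso"],
              "Health effects": ["risk", "benefit"],
              "Caffeine": ["stimulant"]}
w = expand_hierarchical(["Coffee", "Health effects", "Caffeine"], expansions)
print(w)
```

The per-element expansions could come from local feedback or entity links; the merged weighted terms then feed a standard retrieval model.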
Inferring functional modules of protein families with probabilistic topic models
Background: Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. Results: We describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily-sized functional modules. Conclusions: We show that our method identifies protein modules - some of which correspond to well-known biological processes - that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.
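The co-occurrence modelling can be illustrated with a tiny collapsed Gibbs sampler for a generic LDA-style topic model, where "documents" are samples (bags of protein-family ids) and topics play the role of functional modules. This is a textbook LDA sketch, not the authors' exact Bayesian model, and the toy samples are invented.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for a small LDA-style topic model."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * n_topics for _ in docs]      # doc-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # topic totals
    z = []                                    # topic assignment per token
    for d, doc in enumerate(docs):            # random initialization
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[d][k] += 1; nkw[k][widx[w]] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(n_iter):                   # resample each token's topic
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = z[d][i], widx[w]
                ndk[d][k] -= 1; nkw[k][wi] -= 1; nk[k] -= 1
                probs = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta)
                         / (nk[t] + V * beta) for t in range(n_topics)]
                r = rng.random() * sum(probs)
                acc = 0.0
                for t, p in enumerate(probs):
                    acc += p
                    if r <= acc:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wi] += 1; nk[k] += 1
    # report the top families per topic as the inferred "modules"
    return [[vocab[w] for w in sorted(range(V), key=lambda w: -nkw[t][w])[:3]]
            for t in range(n_topics)]

# invented toy samples: two consistently co-occurring groups of families
docs = [["famA", "famB", "famC"]] * 4 + [["famX", "famY", "famZ"]] * 4
modules = lda_gibbs(docs, n_topics=2, n_iter=100)
print(modules)
```

On such cleanly separated toy data the sampler typically recovers the two co-occurring groups as the two modules; real metagenomic samples are far noisier and motivate the full Bayesian treatment.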